Generating Reproducible Statistical Analyses and Evaluation Reports: Principles, Practices, and Free Software Tools

Demonstration at Evaluation 2024: Amplifying and Empowering Voices, annual conference of the American Evaluation Association, Portland, OR

Steven J. Pierce

Center for Statistical Training and Consulting, Michigan State University

2024-10-22

Outline

  • What do you mean by reproducible?
  • Why should we aim for reproducibility?
  • How do we achieve reproducibility?

Reproducible Research (RR)

… is achieved when investigators share all the materials required to exactly recreate the findings so that others can verify them or conduct alternative analyses.

Statistical Results Can Be:

Repeatable

  • Original analyst
  • Original data

Reproducible

  • New analyst
  • Original data

Replicable

  • New analyst
  • New data

RR is a Product

[Diagram: quantitative methods, qualitative methods, and mixed methods all lead to reproducible research.]

RR is a product of how we work, not which methods we use.

Methods Matter!

[Diagram: a continuum of trustworthiness & credibility running from low to high.]

Irreproducible < Reproducible < Replicated

Important

Reproducibility is an attainable minimum standard for science[1].

Guiding Principles for Evaluators

Pursuing reproducibility enacts our guiding principles[2]:


  • Systematic inquiry
  • Competence
  • Integrity

Funders Value RR

Data sharing and reproducibility initiatives

Emerging Scientific Norms

Publishing Technology

[Diagram: under print distribution, journals could publish only the manuscript; online distribution is easy and low cost with no page limits, so journals publish manuscripts with supplemental files while authors deposit codebooks, data, and software in archives.]

Career Benefits of RR

  • Motivation to focus on quality
  • Become more efficient
  • Create more products
  • Easier to get published
  • Get cited more often
  • Build your reputation

Materials Required to Recreate Findings

Materials:

  • Manuals & procedures
  • Instruments & scoring rules
  • Codebooks
  • Methods applied
  • Data mgt decisions
  • Data files
  • Software & analysis scripts

Findings:

  • Statistics
  • Coefficients & p-values
  • Confidence intervals
  • Effect sizes
  • Model fit indices
  • Figures
  • Tables

Principles for Achieving Reproducibility

  • Collaboration
  • Organization
  • Automation
  • Preservation
  • Integration
  • Separation

Workflow Woes

  • Using GUIs, menus, & dialog boxes to do tasks
  • Manually updating data files
  • Versioning by saving to new file names
  • Disorganized folders & files
  • No audit trail
  • Copying & pasting output from stats software to a word processor
  • Fixing mistakes takes lots of time

Dynamic Documents Via Quarto + R

[Diagram: Quarto renders Report.qmd (a Quarto script with R code) into Report.pdf; the script reads Study_Data.csv (raw data file), cites references.bib (BibTeX data file), and uses apa.csl (Citation Style Language file).]
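A minimal Report.qmd sketch of this workflow (file names Study_Data.csv, references.bib, and apa.csl come from the diagram; the citation key is hypothetical):

````markdown
---
title: "Evaluation Report"
format: pdf
bibliography: references.bib
csl: apa.csl
---

Reproducibility is an attainable minimum standard [@peng2006].

```{r}
#| label: load-data
data <- read.csv("Study_Data.csv")
summary(data)
```
````

Rendering this one file pulls together data, analysis, citations, and formatting in a single step.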

R Packages

… are folders of digital files designed for sharing code & help documentation.[19] They:

  • Organize files into a conventional structure
  • Can contain data, meta-data, & other documentation
  • Can contain dynamic documents & rendered output

Tip

Use an R package for a research compendium![18]
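One way to start such a compendium (a sketch assuming the usethis package is installed; the project path and data name are hypothetical):

```r
# Create a package skeleton to hold the research compendium
usethis::create_package("~/Projects/OurProject")

# Add standard pieces: a raw-data folder, a README, and a license
usethis::use_data_raw("Study_Data")  # creates data-raw/Study_Data.R
usethis::use_readme_md()
usethis::use_mit_license()
```

The package skeleton supplies the conventional folder structure for free.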

Git Repositories

… are folders of files being tracked by Git[17] for version control purposes.[20,21] They:

  • Preserve a history of changes to each tracked file
  • Allow recovering a file’s prior state from the history
  • Can be local or remote (hosted on a server)
  • Can be private or public

Tip

Put your research compendium in a Git repository!
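Starting version control on an existing compendium takes only a few commands (a sketch; the folder name OurProject is hypothetical):

```shell
cd OurProject               # the research compendium folder
git init                    # start tracking this folder with Git
git add .                   # stage all current files
git commit -m "Initial commit of research compendium"
```

From this point on, every tracked change is part of a recoverable history.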

GitHub

[Diagram: a remote main repository (OurProject) on a GitHub server is cloned/pulled to local repositories on two computers, and each computer pushes its changes back to the server.]

Organization

  • Organize files into a research compendium.
  • Create documentation (README files, vignettes, etc.).
  • Use folder structures & naming conventions
  • Use data flow diagrams and/or rendering scripts
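A typical compendium layout in the spirit of Marwick et al.[18] might look like this (folder names illustrative):

```
OurProject/
├── DESCRIPTION        # package metadata
├── README.md          # project overview & documentation
├── data-raw/          # raw data & cleaning scripts
├── data/              # cleaned analysis data
├── R/                 # custom functions
├── analysis/          # Quarto scripts & rendered reports
└── vignettes/         # long-form documentation
```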

Automation

  • Use scripts instead of GUIs
  • Make computers do the tedious labor
  • Computers are patient idiots
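A short rendering script is enough to make the computer rebuild everything on demand (a sketch assuming the quarto R package; the file names are hypothetical):

```r
# Rebuild every report in the project in one step
reports <- c("Report.qmd", "Appendix.qmd")
for (qmd in reports) {
  quarto::quarto_render(qmd)
}
```

Re-running one script regenerates all output, so fixing a mistake upstream no longer means redoing downstream work by hand.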

Preservation

What should you preserve?

  • Data and meta-data
  • Methodology decisions
  • Data cleaning, management, and analysis scripts
  • Key decisions and rationales
  • Source code for scripts
  • Version histories of data, code, output, and deliverables
  • Who changed what & when
  • Software environment & versions used

Integration

  • Attach meta-data directly to data
  • Include codebooks (or scripts to generate them from the data)
  • Integrate narrative text with code via markup languages
  • Add custom functions and associated help files to research compendium
  • Include references

Separation

  • Data from code
  • Raw data from cleaned data
  • Code from output
  • Draft from production output
  • Version history from current files

Version Control

  • Git tracks how files changed (additions & deletions), who changed them, when, & why
  • Searchable & recoverable version history
  • Eliminates the need to use filename conventions for versioning
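Git's history answers "who changed what, when, and why" directly (a sketch of common commands; the file names are hypothetical):

```shell
git log --oneline            # compact history of commits
git log -p analysis.R        # what changed in one file, commit by commit
git diff HEAD~1 HEAD         # differences introduced by the last commit
git checkout HEAD~1 -- analysis.R   # recover a file's prior state
```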

Narrative Text

  • Formatting
  • Headings
  • Links
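In Quarto, narrative text is plain Markdown; a brief sketch of the three features listed:

```markdown
## Results

The intervention group **improved significantly** (formatting).

See the [AEA guiding principles](https://www.eval.org/About/Guiding-Principles) (a link).
```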

Inline R Code
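Inline R code lets Quarto insert computed values directly into sentences, so numbers in the text can never disagree with the analysis (a sketch; the data and variable names are hypothetical):

````markdown
```{r}
scores <- c(72, 85, 91)
```

The mean score was `r mean(scores)` across `r length(scores)` participants.
````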

Tables
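Tables can be generated straight from analysis results rather than typed by hand (a sketch assuming the knitr package; the values are hypothetical):

```r
library(knitr)

# Build a formatted table directly from a data frame of results
fit <- data.frame(
  Term = c("Intercept", "Treatment"),
  Estimate = c(2.31, 0.85),
  p = c(0.001, 0.020)
)
kable(fit, digits = 2, caption = "Model coefficients")
```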

Figures
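Figures likewise come straight from code, so they update automatically whenever the data change (a sketch assuming the ggplot2 package and R's built-in mtcars data):

```r
library(ggplot2)

# A scatterplot regenerated on every render
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight", y = "Miles per gallon")
```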

Parameterized Reports
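Parameterized reports let one script produce many tailored deliverables (a sketch; the parameter name site and the data file are hypothetical):

````markdown
---
title: "Site Report"
params:
  site: "Portland"
---

```{r}
site_data <- subset(read.csv("Study_Data.csv"), Site == params$site)
```
````

Rendering with `quarto render Report.qmd -P site:Lansing` then produces the same report for a different site without touching the script.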

References

1. Peng, R. D., Dominici, F., & Zeger, S. L. (2006). Reproducible epidemiologic research. American Journal of Epidemiology, 163(9), 783–789. https://doi.org/10.1093/aje/kwj093
2. American Evaluation Association. (2018). Guiding principles for evaluators [Web Page]. Author. https://www.eval.org/About/Guiding-Principles
3. Bosnjak, M., Fiebach, C. J., Mellor, D., Mueller, S., O’Connor, D. B., Oswald, F. L., & Sokol, R. I. (2022). A template for preregistration of quantitative research in psychology: Report of the Joint Psychological Societies Preregistration Task Force. American Psychologist, 77(4), 602–615. https://doi.org/10.1037/amp0000879
4. DeCoster, J., Sparks, E. A., Sparks, J. C., Sparks, G. G., & Sparks, C. W. (2015). Opportunistic biases: Their origins, effects, and an integrated solution. American Psychologist, 70(6), 499–514. https://doi.org/10.1037/a0039191
5. Moore, D. A. (2016). Preregister if you want to. American Psychologist, 71(3), 238–239. https://doi.org/10.1037/a0040195
6. Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA publications and communications board task force report. American Psychologist, 73(1), 3–25. https://doi.org/10.1037/amp0000191
7. Moher, D., Hopewell, S., Schulz, K. F., Montori, V., Gøtzsche, P. C., Devereaux, P. J., Elbourne, D., Egger, M., & Altman, D. G. (2010). CONSORT 2010 explanation and elaboration: Updated guidelines for reporting parallel group randomized trials. Journal of Clinical Epidemiology, 63(8), e1–e37. https://doi.org/10.1016/j.jclinepi.2010.03.004
8. Schulz, K. F., Altman, D. G., & Moher, D. (2010). CONSORT 2010 statement: Updated guidelines for reporting parallel group randomised trials. Journal of Clinical Epidemiology, 63(8), 834–840. https://doi.org/10.1016/j.jclinepi.2010.02.005
9. von Elm, E., Altman, D. G., Egger, M., Pocock, S. J., Gøtzsche, P. C., & Vandenbroucke, J. P. (2007). The strengthening the reporting of observational studies in epidemiology (STROBE) statement: Guidelines for reporting observational studies. PLoS Medicine, 4(10), e296. https://doi.org/10.1371/journal.pmed.0040296
10. Hrynaszkiewicz, I., & Altman, D. G. (2009). Towards agreement on best practice for publishing raw clinical trial data. Trials, 10(1), 17. https://doi.org/10.1186/1745-6215-10-17
11. Hrynaszkiewicz, I., Norton, M. L., Vickers, A. J., & Altman, D. G. (2010). Preparing raw clinical data for publication: Guidance for journal editors, authors, and peer reviewers. Trials, 11(1), 9. https://doi.org/10.1186/1745-6215-11-9
12. Laine, C., Goodman, S. N., Griswold, M. E., & Sox, H. C. (2007). Reproducible research: Moving toward research the public can really trust. Annals of Internal Medicine, 146(6), 450–453. https://doi.org/10.7326/0003-4819-146-6-200703200-00154
13. Peng, R. D. (2009). Reproducible research and biostatistics. Biostatistics, 10(3), 405–408. https://doi.org/10.1093/biostatistics/kxp014
14. R Development Core Team. (2024). R: A language and environment for statistical computing (Version 4.4.1) [Computer Program]. R Foundation for Statistical Computing. http://www.R-project.org
15. RStudio Team. (2024). RStudio Desktop: Integrated development environment for R (Version 2024.04.2+764) [Computer Program]. Posit Software, PBC. https://posit.co
16. Allaire, J. J., Dervieux, C., Scheidegger, C., Teague, C., & Xie, Y. (2024). Quarto (Version 1.5.57) [Computer Program]. Posit Software, PBC. https://quarto.org
17. Torvalds, L., Hamano, J. C., & other contributors to the Git Project. (2024). Git for Windows (Version 2.46.0(1)) [Computer Program]. Software Freedom Conservancy. https://git-scm.com
18. Marwick, B., Boettiger, C., & Mullen, L. (2018). Packaging data analytical work reproducibly using r (and friends). The American Statistician, 72(1), 80–88. https://doi.org/10.1080/00031305.2017.1375986
19. Wickham, H., & Bryan, J. (2021). R packages: Organize, test, document, and share your code. O’Reilly Media. https://r-pkgs.org
20. Bryan, J. (2018). Excuse me, do you have a moment to talk about version control? The American Statistician, 72(1), 20–27. https://doi.org/10.1080/00031305.2017.1399928
21. Chacon, S., & Straub, B. (2014). Pro Git. Apress Media. https://git-scm.com/book/en/v2